AITopics | sequence transduction

Structured Reordering for Modeling Latent Alignments in Sequence Transduction

Neural Information Processing Systems, Dec-24-2025, 06:48:52 GMT

Despite success in many domains, neural models struggle in settings where train and test examples are drawn from different distributions. In particular, in contrast to humans, conventional sequence-to-sequence (seq2seq) models fail to generalize systematically, i.e., interpret sentences representing novel combinations of concepts (e.g., text segments) seen in training. Traditional grammar formalisms excel in such settings by implicitly encoding alignments between input and output segments, but are hard to scale and maintain. Instead of engineering a grammar, we directly model segment-to-segment alignments as discrete structured latent variables within a neural seq2seq model. To efficiently explore the large space of alignments, we introduce a reorder-first align-later framework whose central component is a neural reordering module producing separable permutations. We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations, and, thus, enabling end-to-end differentiable training of our model. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks (i.e., semantic parsing and machine translation).
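To make the notion of separable permutations in the abstract above concrete, here is a minimal Python sketch. It assumes the standard combinatorial definition (a single element is separable, and two adjacent separable blocks may be concatenated in straight or inverted order); the function name `separable` is hypothetical and the code is an illustration, not the paper's implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def separable(i, j):
    """Return every reordering of positions i..j-1 that can be built by
    recursively splitting the span in two and keeping (straight) or
    swapping (inverted) the two halves."""
    if j - i == 1:
        return frozenset({(i,)})
    results = set()
    for k in range(i + 1, j):  # split point between the two sub-spans
        for left in separable(i, k):
            for right in separable(k, j):
                results.add(left + right)  # straight: left block first
                results.add(right + left)  # inverted: right block first
    return frozenset(results)


four = separable(0, 4)
print(len(four))             # 22 of the 24 permutations of four items
print((1, 3, 0, 2) in four)  # False: the pattern 2413 is not separable
```

For four items only two of the 24 permutations are excluded, but the fraction of separable permutations shrinks quickly as the sequence grows, which is why restricting the reordering module to this family makes the search space much smaller.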

modeling latent alignment, name change, structured reordering, (4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.60)

Appendix: Structured Reordering for Modeling Latent Alignments in Sequence Transduction

Neural Information Processing Systems, Aug-15-2025, 02:20:02 GMT

WCFG to PCFG Conversion: The algorithm for converting a WCFG to its equivalent PCFG is shown in Algorithm 1. A full proof of this equivalence can be found in Smith and Johnson [1]. Proof of the Dynamic Programming for Marginal Inference: We prove the correctness of the dynamic programming algorithm for computing the marginal permutation matrix of separable permutations by induction as follows. As a base case, each word (i.e., segment with length 1) is associated with an identity permutation matrix 1. In the structured reordering module, we compute the scores for BTG production rules using span [...] Figure 1: The detailed architecture of our seq2seq model for semantic parsing (view in color). First, the structured reordering module generates a (relaxed) permutation matrix given the input utterance.
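The induction described in this excerpt can be illustrated with a small Python sketch of a marginal (expected) permutation matrix computed bottom-up over spans. This is a sketch under stated assumptions, not the paper's algorithm: the names `marginal_permutation` and `uniform_rule_prob` are hypothetical, the rule probabilities are a uniform stand-in for the neural scores, and each span is assumed to choose its split point and orientation independently, with probabilities summing to 1 per span.

```python
from functools import lru_cache


def uniform_rule_prob(i, k, j, inverted):
    # Hypothetical stand-in for learned scores: every (split, orientation)
    # choice for the span [i, j) is equally likely.
    return 1.0 / (2 * (j - i - 1))


def marginal_permutation(n, rule_prob=uniform_rule_prob):
    """Expected permutation matrix for n words; entry [a][b] is the
    probability that input position a lands at output position b."""

    @lru_cache(maxsize=None)
    def expect(i, j):
        size = j - i
        if size == 1:
            return ((1.0,),)  # base case: 1x1 identity matrix
        total = [[0.0] * size for _ in range(size)]
        for k in range(i + 1, j):
            left, right = expect(i, k), expect(k, j)
            l, r = k - i, j - k
            p_straight = rule_prob(i, k, j, False)
            p_inverted = rule_prob(i, k, j, True)
            for a in range(l):
                for b in range(l):
                    total[a][b] += p_straight * left[a][b]      # left stays first
                    total[a][r + b] += p_inverted * left[a][b]  # left moves after right
            for a in range(r):
                for b in range(r):
                    total[l + a][l + b] += p_straight * right[a][b]  # right stays second
                    total[l + a][b] += p_inverted * right[a][b]      # right moves to front
        return tuple(tuple(row) for row in total)

    return expect(0, n)


print(marginal_permutation(2))  # ((0.5, 0.5), (0.5, 0.5))
```

Because each entry is a weighted sum of child entries, the result is differentiable with respect to the rule probabilities, which is the property a relaxed permutation matrix needs for end-to-end training.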

module, permutation matrix, reordering module, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

GPT-3 101: a brief introduction

#artificialintelligence, Aug-1-2020, 13:55:10 GMT

Let's start with the basics. GPT-3 stands for Generative Pretrained Transformer version 3, and it is a sequence transduction model. Simply put, sequence transduction is a technique that transforms an input sequence to an output sequence. GPT-3 is a language model, which means that, using sequence transduction, it can predict the likelihood of an output sequence given an input sequence. This can be used, for instance, to predict which word makes the most sense given a text sequence.
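As a toy illustration of predicting which word makes the most sense given the preceding text, the sketch below uses a simple bigram count model. This is a deliberately swapped-in technique that is far simpler than GPT-3 and shares none of its architecture; the function names and the two-sentence corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the follow-up counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}
    return {word: c / total for word, c in counts[prev].items()}


corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(max(probs, key=probs.get))  # cat
print(probs["cat"])               # 0.5
```

A large language model replaces the count table with a neural network conditioned on the whole preceding sequence, but the output has the same shape: a probability for each candidate next word.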

gpt-3, large language model, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sequence Transduction with Recurrent Neural Networks

Graves, Alex

arXiv.org Machine Learning, Nov-14-2012

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv:1211.3711

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology: